Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PULP-276] Add support for prns #3864

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

pedro-psb
Copy link
Member

@pedro-psb pedro-psb commented Jan 21, 2025

Notes:

  • The copy API used to check if objects belonged to the current domain. For PRN references, we'll have to do it in the task context to avoid response timeouts.

@github-actions github-actions bot added multi-commit Added when a PR consists of more than one commit no-changelog no-issue labels Jan 21, 2025
@pedro-psb pedro-psb changed the title Add support for prns [PULP-276] Add support for prns Jan 23, 2025
@pedro-psb pedro-psb force-pushed the add-support-for-prns branch 2 times, most recently from f0697ee to 3e948d0 Compare January 23, 2025 18:38
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was copied from the PR adding PRN to pulpcore: pulp/pulpcore#5813.
It did catch the DistributionTree case, where it did not had prn specified.

@pedro-psb pedro-psb force-pushed the add-support-for-prns branch from 3e948d0 to 0664071 Compare January 23, 2025 18:45
@pedro-psb pedro-psb linked an issue Jan 24, 2025 that may be closed by this pull request
@pedro-psb
Copy link
Member Author

I've ran a few experiments after on-demand syncing >10,000 package from centos and here are some conclusions:

  1. The distinct query suggested by @dralley is better! [0]
  2. The extract_pk performs very poorly with pulp_hrefs because of the use of django's resolve. [1]

[0] Query comparision w/ list of 10,000 packages prns

In [54]: timeit(lambda: Content.objects.filter(pk__in=(extract_pk(n) for n in prns)).values("pulp_domain").distinct(), number=10)
Out[55]: 0.17054506699787453

In [56]: timeit(lambda: Content.objects.filter(pk__in=(extract_pk(n) for n in prns),pulp_domain=domain).count(), number=10)
Out[57]: 0.6148349700015387

[1] Very bad performance with extract_pk from pulp_href

In [34]: timeit(lambda: [extract_pk(n) for n in prns], number=10)
Out[34]: 0.023607832001289353

In [35]: timeit(lambda: [extract_pk(n) for n in hrefs], number=10)
Out[35]: 22.53728126399801

In [36]: len(prns)
Out[36]: 10000

In [37]: len(hrefs)
Out[37]: 10000

@dralley
Copy link
Contributor

dralley commented Jan 27, 2025

Great!

If we can do all validation in the view layer with clever queries, that would be greatly preferable to deferring it to the task.

If we can reimplement extract_pk such that it doesn't use Django's resolve(), but still do the validation we want to do, that would be ideal.

@pedro-psb pedro-psb force-pushed the add-support-for-prns branch from f4e96cb to 0f60c2c Compare January 28, 2025 16:44
* Add cross_domain validation for PRNs on copy API serializer. The new
  validation doesn't capture the first href/prn that failed anymore,
  as it relies on querying distinct domains in the bulk of references.

Fixes: pulp#3853
@pedro-psb pedro-psb force-pushed the add-support-for-prns branch from 0f60c2c to 8a7fc2a Compare January 28, 2025 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multi-commit Added when a PR consists of more than one commit no-changelog no-issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Advanced copy API needs PRN support
2 participants